
    A Style-Based Generator Architecture for Generative Adversarial Networks

    We propose an alternative generator architecture for generative adversarial networks, borrowing from the style transfer literature. The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) from stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis. The new generator improves the state of the art in terms of traditional distribution quality metrics, has demonstrably better interpolation properties, and better disentangles the latent factors of variation. To quantify interpolation quality and disentanglement, we propose two new, automated methods that are applicable to any generator architecture. Finally, we introduce a new, highly varied, high-quality dataset of human faces.
    Comment: CVPR 2019 final version
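
    In the StyleGAN paper, the idea "borrowed from style transfer" is adaptive instance normalization (AdaIN): each feature-map channel is normalized with its own mean and variance, then scaled and shifted by per-channel style values that a learned affine transform derives from the intermediate latent. The following is a minimal plain-Python sketch of that normalization step only, assuming a single sample whose channels are flattened into lists; the function name `adain`, its parameters, and the toy numbers are illustrative and are not taken from the paper's code.

```python
import math
from typing import List


def adain(features: List[List[float]], style_scale: List[float],
          style_bias: List[float], eps: float = 1e-5) -> List[List[float]]:
    """Adaptive instance normalization for a single sample (illustrative sketch).

    features[c] holds the flattened spatial activations of channel c.
    Each channel is normalized to zero mean and unit variance using its
    own statistics, then scaled by style_scale[c] and shifted by style_bias[c].
    """
    if not (len(features) == len(style_scale) == len(style_bias)):
        raise ValueError("need one (scale, bias) pair per channel")
    out = []
    for channel, y_s, y_b in zip(features, style_scale, style_bias):
        mean = sum(channel) / len(channel)
        var = sum((v - mean) ** 2 for v in channel) / len(channel)
        std = math.sqrt(var + eps)  # eps avoids division by zero on flat channels
        out.append([y_s * (v - mean) / std + y_b for v in channel])
    return out


# Toy example: 2 channels, 4 spatial positions each (hypothetical numbers).
x = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 2.0, 2.0]]
styled = adain(x, style_scale=[2.0, 1.0], style_bias=[0.0, 5.0])
print([[round(v, 3) for v in ch] for ch in styled])
```

    After the call, each output channel has (approximately) the mean given by its bias and the standard deviation given by its scale, so the style values alone decide the channel statistics; this is what lets styles injected at different resolutions act as scale-specific controls.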

    StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis

    Text-to-image synthesis has recently seen significant progress thanks to large pretrained language models, large-scale training data, and the introduction of scalable model families such as diffusion and autoregressive models. However, the best-performing models require iterative evaluation to generate a single sample. In contrast, generative adversarial networks (GANs) need only a single forward pass. They are thus much faster, but they currently remain far behind the state of the art in large-scale text-to-image synthesis. This paper aims to identify the steps necessary to regain competitiveness. Our proposed model, StyleGAN-T, addresses the specific requirements of large-scale text-to-image synthesis, such as large capacity, stable training on diverse datasets, strong text alignment, and a controllable trade-off between variation and text alignment. StyleGAN-T significantly improves over previous GANs and outperforms distilled diffusion models (the previous state of the art in fast text-to-image synthesis) in terms of sample quality and speed.
    Comment: Project page: https://sites.google.com/view/stylegan-t

    COEGAN: Evaluating the Coevolution Effect in Generative Adversarial Networks

    Generative adversarial networks (GANs) achieve state-of-the-art results in generating samples that follow the distribution of the input dataset. However, GANs are difficult to train, and several aspects of the model must be designed by hand in advance. Neuroevolution is a well-known technique for the automatic design of network architectures, and it was recently extended to deep neural networks. COEGAN is a model that uses neuroevolution and coevolution in the GAN training algorithm to provide a more stable training method and the automatic design of neural network architectures. COEGAN exploits the adversarial relationship between the GAN components to implement coevolutionary strategies in the training algorithm. Our proposal was evaluated on the Fashion-MNIST and MNIST datasets. We compare our results with a DCGAN-based baseline and with results from a random search algorithm. We show that our method is able to discover efficient architectures on the Fashion-MNIST and MNIST datasets. The results also suggest that COEGAN can be used as a training algorithm for GANs to avoid common issues such as mode collapse.
    Comment: Published in GECCO 2019. arXiv admin note: text overlap with arXiv:1912.0617
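
    The abstract does not spell out the coevolutionary loop, so the following is only a generic illustration of competitive coevolution between a generator population and a discriminator population, not COEGAN's actual algorithm (which also evolves network architectures). As a simplifying assumption, each individual is reduced to a single "skill" number, and the hypothetical `duel` function stands in for training and evaluating one generator/discriminator pair; `all_vs_all_fitness`, `select_and_mutate`, and `coevolve` are likewise hypothetical names.

```python
import math
import random
from typing import List, Tuple


def duel(g: float, d: float) -> float:
    """Hypothetical stand-in for evaluating one generator against one
    discriminator: probability that the generator 'wins'."""
    return 1.0 / (1.0 + math.exp(d - g))


def all_vs_all_fitness(gens: List[float],
                       discs: List[float]) -> Tuple[List[float], List[float]]:
    """Score each individual by its mean result against the whole opposing population."""
    g_fit = [sum(duel(g, d) for d in discs) / len(discs) for g in gens]
    d_fit = [sum(1.0 - duel(g, d) for g in gens) / len(gens) for d in discs]
    return g_fit, d_fit


def select_and_mutate(pop: List[float], fitness: List[float],
                      rng: random.Random, sigma: float = 0.1) -> List[float]:
    """Keep the fitter half unchanged; refill with mutated copies of survivors."""
    ranked = [ind for _, ind in sorted(zip(fitness, pop), reverse=True)]
    survivors = ranked[: max(1, len(pop) // 2)]
    children = [rng.choice(survivors) + rng.gauss(0.0, sigma)
                for _ in range(len(pop) - len(survivors))]
    return survivors + children


def coevolve(pop_size: int = 6, generations: int = 20,
             seed: int = 0) -> Tuple[List[float], List[float]]:
    rng = random.Random(seed)
    gens = [rng.uniform(-1.0, 1.0) for _ in range(pop_size)]
    discs = [rng.uniform(-1.0, 1.0) for _ in range(pop_size)]
    for _ in range(generations):
        # Each population's fitness depends on the other: the coevolutionary coupling.
        g_fit, d_fit = all_vs_all_fitness(gens, discs)
        gens = select_and_mutate(gens, g_fit, rng)
        discs = select_and_mutate(discs, d_fit, rng)
    return gens, discs


gens, discs = coevolve()
print(len(gens), len(discs))
```

    All-vs-all pairing costs one evaluation per generator/discriminator pair in every generation; with real GANs each evaluation involves training, so cheaper pairing schemes (for example, evaluating against only the best opponent) are a common alternative in coevolutionary algorithms.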

    Generative Novel View Synthesis with 3D-Aware Diffusion Models

    We present a diffusion-based model for 3D-aware generative novel view synthesis from as few as a single input image. Our model samples from the distribution of possible renderings consistent with the input and, even in the presence of ambiguity, is capable of rendering diverse and plausible novel views. To achieve this, our method makes use of existing 2D diffusion backbones but, crucially, incorporates geometry priors in the form of a 3D feature volume. This latent feature field captures the distribution over possible scene representations and improves our method's ability to generate view-consistent novel renderings. In addition to generating novel views, our method can autoregressively synthesize 3D-consistent sequences. We demonstrate state-of-the-art results on synthetic renderings and room-scale scenes; we also show compelling results for challenging, real-world objects.
    Comment: Project page: https://nvlabs.github.io/genv